No. The model suggests that roughly 5% of the data should fall outside the range of these bars. That looks compatible with the data we have.
Note: These points are not outliers – they fit the pattern that the model anticipates for the data (at least as far as this plot diagnoses) just fine. The model says there should be some data out there.
In problems 2-6, the point was to draw conclusions from a comparison of the pictures presented.

Run the following examples in R. Compare the plots produced and comment on the big idea(s) illustrated by this comparison.
```r
library(CalvinBayes)
BernGrid("H", resolution = 4, prior = triangle::dtriangle)
BernGrid("H", resolution = 10, prior = triangle::dtriangle)
BernGrid("H", prior = 1, resolution = 100, geom = geom_col)
BernGrid("H", resolution = 100,
prior = function(p) abs(p - 0.5) > 0.48, geom = geom_col)
```
<img src="Redoing-sols-A_files/figure-html/BernGrid-01-1.png" width="80%" /><img src="Redoing-sols-A_files/figure-html/BernGrid-01-2.png" width="80%" /><img src="Redoing-sols-A_files/figure-html/BernGrid-01-3.png" width="80%" /><img src="Redoing-sols-A_files/figure-html/BernGrid-01-4.png" width="80%" />
The posterior can only be non-zero where both the prior and the likelihood are non-zero.
In each example we see that flipping a head drops the posterior to 0 for a coin that only gives tails because it is impossible to flip a head with such a coin. The last example shows that if the prior is 0, the posterior will also be 0. If our prior says something is impossible, no amount of data will change our minds. (So generally, we are cautious about having the prior be 0 unless something is truly impossible, not just unlikely or unexpected.)
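This zero-prior behavior is easy to verify outside of BernGrid with a hand-rolled grid update in base R (a minimal sketch; the grid, prior, and data here are made up for illustration):

```r
# Posterior is proportional to prior * likelihood, so any grid point
# where the prior is 0 keeps posterior 0 no matter what the data say.
theta <- seq(0, 1, by = 0.01)
prior <- as.numeric(abs(theta - 0.5) > 0.48)  # 0 except very near 0 and 1
likelihood <- dbinom(5, 10, theta)            # say, 5 heads in 10 flips
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)
all(posterior[prior == 0] == 0)               # TRUE
```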
Run the following examples in R. Compare the plots produced and comment on the big idea(s) illustrated by this comparison.
```r
library(CalvinBayes)
BernGrid("TTHT", prior = triangle::dtriangle)
BernGrid("TTHT",
         prior = function(x) triangle::dtriangle(x)^0.1)
BernGrid("TTHT",
         prior = function(x) triangle::dtriangle(x)^10)
```
The only thing changing in these examples is the prior; the data are the same each time. The more concentrated the prior, the more it shapes the posterior.
Run the following examples in R. Compare the plots produced and comment on the big idea(s) illustrated by this comparison.
```r
library(CalvinBayes)
dfoo <- function(p) {
  0.02 * dunif(p) +
    0.49 * triangle::dtriangle(p, 0.1, 0.2) +
    0.49 * triangle::dtriangle(p, 0.8, 0.9)
}
BernGrid(c(rep(0, 13), rep(1, 14)), prior = triangle::dtriangle)
BernGrid(c(rep(0, 13), rep(1, 14)), resolution = 1000, prior = dfoo)
```
Comparing the two priors we see that the large concentrations of the foo prior near 0.15 and 0.85 result in a posterior that allocates some credibility there even though the data are suggesting a value near 0.5. Since the triangle distribution does not assign especially large credibility there, those “bumps” near 0.15 and 0.85 are not present in the posterior when we use a triangle prior.
Also note that the posterior bumps for the foo prior are shifted toward the center (because the data are near 50% heads).
Run the following examples in R. Compare the plots produced and comment on the big idea(s) illustrated by this comparison.
```r
library(CalvinBayes)
dfoo <- function(p) {
  0.02 * dunif(p) +
    0.49 * triangle::dtriangle(p, 0.1, 0.2) +
    0.49 * triangle::dtriangle(p, 0.8, 0.9)
}
BernGrid(c(rep(0, 3), rep(1, 3)), prior = dfoo)
BernGrid(c(rep(0, 10), rep(1, 10)), prior = dfoo)
BernGrid(c(rep(0, 30), rep(1, 30)), prior = dfoo)
BernGrid(c(rep(0, 100), rep(1, 100)), prior = dfoo)
```
With little data, the prior has great sway over the posterior. With more and more data, the influence of the prior becomes less and less.
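To see this numerically, here is a base-R sketch (independent of BernGrid; the dnorm prior is just a stand-in for an opinionated prior) tracking the posterior mean as the sample size grows while the data stay at 50% heads:

```r
# A prior concentrated near theta = 0.15, updated with 50/50 data.
theta <- seq(0.001, 0.999, by = 0.001)
prior <- dnorm(theta, mean = 0.15, sd = 0.05)
post_mean <- function(n) {
  post <- prior * dbinom(n / 2, n, theta)
  sum(theta * post / sum(post))
}
sapply(c(6, 20, 60, 200), post_mean)  # marches toward 0.5 as n grows
```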
Run the following examples in R and compare them to the ones in the previous exercise. What do you observe?
```r
library(CalvinBayes)
dfoo <- function(p) {
  0.02 * dunif(p) +
    0.49 * triangle::dtriangle(p, 0.1, 0.2) +
    0.49 * triangle::dtriangle(p, 0.8, 0.9)
}
BernGrid(c(rep(0, 3), rep(1, 4)), prior = dfoo)
BernGrid(c(rep(0, 4), rep(1, 3)), prior = dfoo)
BernGrid(c(rep(0, 10), rep(1, 11)), prior = dfoo)
BernGrid(c(rep(0, 11), rep(1, 10)), prior = dfoo)
BernGrid(c(rep(0, 30), rep(1, 31)), prior = dfoo)
BernGrid(c(rep(0, 31), rep(1, 30)), prior = dfoo)
```
A very informative/opinionated prior can turn small differences in a small data set into large differences in the posterior because it pushes the posterior toward one of the regions where the prior is large. But with more and more data, this effect becomes less and less pronounced.
Create a function in R that converts Fahrenheit temperatures to Celsius temperatures. [Hint: \(C = (F-32) \cdot 5/9\).]
What you turn in should show your function definition and the result of using c(-40, 32, 98.6, 212) as the input.

```r
to_celsius <- function(temp) {
  (temp - 32) * 5 / 9
}
to_celsius(c(-40, 32, 98.6, 212))
## [1] -40   0  37 100
```
- Use function() to create a function in R that is equivalent to \(p(x)\).
- Use gf_function() to plot the function on the interval \([0, 1]\).
- Confirm that \(p\) is a pdf (using integrate()).
- What is the largest value of \(p(x)\)? At what value of \(x\) does it occur? Is it a problem that this value is larger than 1?

Hint: differentiation might be useful.
```r
f <- function(x) {
  (x >= 0) * (x <= 1) * 6 * x * (1 - x)
}
gf_function(f, xlim = c(0, 1))
integrate(f, 0, 1)
## 1 with absolute error < 1.1e-14
```
\(\frac{df}{dx} = 6 - 12x = 0\) when \(x = 0.5\) and the second derivative is negative, so the maximum value occurs when \(x = 0.5\).
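The calculus can be double-checked numerically with base R's optimize() (a quick sanity check, not required by the problem):

```r
f <- function(x) (x >= 0) * (x <= 1) * 6 * x * (1 - x)
# Numerical check: the maximum is at x = 0.5, with height 1.5
opt <- optimize(f, interval = c(0, 1), maximum = TRUE)
opt$maximum    # close to 0.5
opt$objective  # close to 1.5
```

Note that the maximum height, 1.5, is larger than 1; that is fine for a density, since it is the area under the curve, not the height, that must equal 1.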
\(B\) is continuous with kernel \(f(x) = x^2(1-x)\) on \([0, 1]\).
Hint: first figure out what the pdf is.
```r
x <- 1:4
p <- function(x) x / 10
# Expected value is sum of values times probabilities
sum(x * p(x))
## [1] 3
```

```r
k <- function(x) x^2 * (1 - x) * (x >= 0) * (x <= 1)
integrate(k, 0, 1)
## 0.08333333 with absolute error < 9.3e-16
f <- function(x) k(x) / integrate(k, 0, 1)$value
integrate(f, 0, 1)
## 1 with absolute error < 1.1e-14
gf_function(f, xlim = c(0, 1))
# expected value
integrate(function(x) x * f(x), 0, 1)
## 0.6 with absolute error < 6.7e-15
```
In Bayesian inference, we will often need to come up with a distribution that matches certain features that correspond to our knowledge or intuition about a situation. Find a normal distribution with a mean of 10 such that half of the distribution is within 3 of 10 (ie, between 7 and 13).
Hint: use qnorm() to determine how many standard deviations are between 10 and 7.
```r
library(mosaic)
z <- qnorm(0.75); z
## [1] 0.6744898
sigma <- (13 - 10) / z; sigma
## [1] 4.447807
xpnorm(c(7, 13), mean = 10, sd = sigma)
## 
## If X ~ N(10, 4.448), then
##   P(X <= 7) = P(Z <= -0.6745) = 0.25    P(X <= 13) = P(Z <= 0.6745) = 0.75
##   P(X >  7) = P(Z >  -0.6745) = 0.75    P(X >  13) = P(Z >  0.6745) = 0.25
## 
## [1] 0.25 0.75
```
School children were surveyed regarding their favorite foods. Of the total sample, 20% were 1st graders, 20% were 6th graders, and 60% were 11th graders. For each grade, the following table shows the proportion of respondents that chose each of three foods as their favorite.
From that information, construct a table of joint probabilities of grade and favorite food.
Are grade and favorite food independent? Explain how you ascertained the answer.
| grade | Ice cream | Fruit | French fries |
|---|---|---|---|
| 1st | 0.3 | 0.6 | 0.1 |
| 6th | 0.6 | 0.3 | 0.1 |
| 11th | 0.3 | 0.1 | 0.6 |
```r
conditional <-
  rbind(c(0.3, 0.6, 0.1), c(0.6, 0.3, 0.1), c(0.3, 0.1, 0.6))
conditional
##      [,1] [,2] [,3]
## [1,]  0.3  0.6  0.1
## [2,]  0.6  0.3  0.1
## [3,]  0.3  0.1  0.6
marginal <-
  cbind(c(0.2, 0.2, 0.6), c(0.2, 0.2, 0.6), c(0.2, 0.2, 0.6))
marginal
##      [,1] [,2] [,3]
## [1,]  0.2  0.2  0.2
## [2,]  0.2  0.2  0.2
## [3,]  0.6  0.6  0.6
joint <- conditional * marginal; joint
##      [,1] [,2] [,3]
## [1,] 0.06 0.12 0.02
## [2,] 0.12 0.06 0.02
## [3,] 0.18 0.06 0.36
sum(joint)
## [1] 1
```
Grade and favorite food are not independent.
```r
conditional == marginal
##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE
## [3,] FALSE FALSE  TRUE
all(conditional == marginal)
## [1] FALSE
```
Three cards are placed in a hat. One is black on both sides, one is white on both sides, and the third is white on one side and black on the other. One card is selected at random from the three cards in the hat and placed on the table.
The top of the card is black.
There are other ways to do this, but here is a way that looks like a Bayesian update of a prior using a likelihood.
```r
expand.grid(
  card = c("BB", "BW", "WW")
) %>%
  mutate(
    prior = 1/3,                # all equally likely to be drawn
    likelihood = c(1, 0.5, 0),  # chance of seeing black on top
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
##   card     prior likelihood posterior
## 1   BB 0.3333333        1.0 0.6666667
## 2   BW 0.3333333        0.5 0.3333333
## 3   WW 0.3333333        0.0 0.0000000
```
The three cards from the previous problem are returned to the hat. Once again a card is pulled and placed on the table, and once again the top is black. This time a second card is drawn and placed on the table. The top side of the second card is white.
```r
Grid <-
  expand.grid(
    card1 = 0:2,  # number of black sides on card
    card2 = 0:2   # number of black sides on card
  ) %>%
  mutate(
    prior0 = ifelse(card1 == card2, 0, 1),
    prior = prior0 / sum(prior0),
    likelihood = card1 / 2 * (2 - card2) / 2,
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Grid %>%
  DT::datatable() %>%
  DT::formatRound(
    columns = ~ prior + likelihood + posterior,
    digits = 3)

# answers
# a: card 1 has two black sides
Grid %>% filter(card1 == 2) %>% pull(posterior) %>% sum()
## [1] 0.75
# b: second card has 1 black side
Grid %>% filter(card2 == 1) %>% pull(posterior) %>% sum()
## [1] 0.25
# c: first card has 2 black sides, second has 1 black side
Grid %>% filter(card1 == 2, card2 == 1) %>%
  pull(posterior) %>% sum()
## [1] 0.25
```
Suppose there are two species of panda, A and B. Without a special blood test, it is not possible to tell them apart. But it is known that half of pandas are of each species and that 10% of births from species A are twins and 20% of births from species B are twins.
If a female panda has twins, what is the probability that she is from species A?
If the same panda later has another set of twins, what is the probability that she is from species A?
A different panda has twins and a year later gives birth to a single panda. What is the probability that this panda is from species A?
Let \(A\) and \(B\) be the events that a panda is species A or B, respectively. Let \(T_i\) be the event that the \(i\)th birth is twins.
\[\begin{align*} P(A \mid T_1) &= \frac{P(A \cap T_1)}{P(T_1)} \\ &= \frac{P(A) \cdot P(T_1 \mid A)}{P(A \cap T_1) + P(B \cap T_1)} \\ &= \frac{P(A) \cdot P(T_1 \mid A)}{P(A) \cdot P(T_1 \mid A) + P(B) \cdot P(T_1 \mid B)} \\ &= \frac{0.5 \cdot 0.1}{0.5 \cdot 0.1 + 0.5 \cdot 0.2} = \frac{1}{3} \end{align*}\]
\[\begin{align*} P(A \mid T_1 \cap T_2) &= \frac{P(A \cap T_1 \cap T_2)}{P(T_1 \cap T_2)} \\ &= \frac{P(A) \cdot P(T_1 \cap T_2 \mid A)} {P(A \cap T_1 \cap T_2) + P(B \cap T_1 \cap T_2)} \\ &= \frac{P(A) P(T_1 \cap T_2 \mid A)} {P(A) \cdot P(T_1 \cap T_2 \mid A) + P(B) \cdot P(T_1 \cap T_2 \mid B)} \\ &= \frac{0.5 \cdot 0.1 \cdot 0.1}{0.5 \cdot 0.1 \cdot 0.1 + 0.5 \cdot 0.2 \cdot 0.2} = 0.2 \end{align*}\]
\[\begin{align} P(A \mid T_1 \cap T_2) &= \frac{P(A \cap T_1 \cap T_2)}{P(T_1 \cap T_2)} = \frac{P(A \cap T_1 \cap T_2)}{P(A \cap T_1 \cap T_2) + P(B \cap T_1 \cap T_2)} \\ &= \frac{P(T_1) \cdot P(A \mid T_1) \cdot P(T_2 \mid A \cap T_1)} {P(T_1) \left[ P(A \mid T_1) \cdot P(T_2 \mid A \cap T_1) + P(B \mid T_1) \cdot P(T_2 \mid B \cap T_1) \right]} \\ &= \frac{P(A \mid T_1) \cdot P(T_2 \mid A)} {P(A \mid T_1) \cdot P(T_2 \mid A) + P(B \mid T_1) \cdot P(T_2 \mid B)} \\ &= \frac{\frac13 \cdot 0.1}{\frac13 \cdot 0.1 + \frac23 \cdot 0.2} = 0.2 \end{align}\] Notice that this says that we can use the posterior after observing the first twins as a prior for the observation of the second twins (ignoring the first twins at this point because the information from that event is now included in the new prior). This works because \(P(T_2 \mid A \cap T_1) = P(T_2 \mid A)\), which follows from the assumption that \(P(T_1 \cap T_2 \mid A) = P(T_1 \mid A) \cdot P(T_2 \mid A)\) (ie, that for each species twins in the first birth are independent of twins in the second):
\[\begin{align*} P(T_2 \mid A \cap T_1) &= \frac{P(T_1 \cap A \cap T_2)} {P(A \cap T_1)} \\ &= \frac{P(A) \cdot P(T_1 \cap T_2 \mid A)} {P(A) \cdot P(T_1 \mid A)} \\ &= \frac{P(A) \cdot P(T_1 \mid A) \cdot P(T_2 \mid A)} {P(A) \cdot P(T_1 \mid A)} \\ &= P(T_2 \mid A) \end{align*}\] A similar statement holds with \(B\) in place of \(A\) by the same argument.
For the last part, let \(T_2\) now be the event that the second birth is NOT twins; the same algebra as above leads to the following arithmetic:
\[\begin{align*} \frac{0.5 \cdot 0.1 \cdot 0.9}{0.5 \cdot 0.1 \cdot 0.9 + 0.5 \cdot 0.2 \cdot 0.8} &= 0.36 \\ \frac{\frac13 \cdot 0.9}{\frac13 \cdot 0.9 + \frac23 \cdot 0.8} &= 0.36 \end{align*}\]
```r
Panda_Grid1 <-
  expand.grid(
    species = c("A", "B")
  ) %>%
  mutate(
    prior = c(0.5, 0.5),
    likelihood = c(0.1, 0.2),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid1
##   species prior likelihood posterior
## 1       A   0.5        0.1 0.3333333
## 2       B   0.5        0.2 0.6666667

Panda_Grid2 <-
  expand.grid(
    species = c("A", "B")
  ) %>%
  mutate(
    prior = c(0.5, 0.5),
    likelihood = c(0.1 * 0.1, 0.2 * 0.2),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid2
##   species prior likelihood posterior
## 1       A   0.5       0.01       0.2
## 2       B   0.5       0.04       0.8

Panda_Grid2a <-
  Panda_Grid1 %>%
  mutate(
    prior = posterior,
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid2a
##   species     prior likelihood posterior
## 1       A 0.3333333        0.1       0.2
## 2       B 0.6666667        0.2       0.8

Panda_Grid3 <-
  expand.grid(
    species = c("A", "B")
  ) %>%
  mutate(
    prior = c(0.5, 0.5),
    likelihood = c(0.1 * 0.9, 0.2 * 0.8),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid3
##   species prior likelihood posterior
## 1       A   0.5       0.09      0.36
## 2       B   0.5       0.16      0.64

Panda_Grid3a <-
  Panda_Grid1 %>%
  mutate(
    prior = posterior,
    likelihood = c(0.9, 0.8),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid3a
##   species     prior likelihood posterior
## 1       A 0.3333333        0.9      0.36
## 2       B 0.6666667        0.8      0.64
```
More Pandas. You get more interested in pandas and learn that at your favorite zoo, 70% of pandas are species A and 30% are species B. You learn that one of the pandas has twins.
What is the probability that the panda is species A?
The same panda gives birth to a single panda the next year. Now what is the probability that the species is A?
```r
Panda_Grid1 <-
  expand.grid(
    species = c("A", "B")
  ) %>%
  mutate(
    prior = c(0.7, 0.3),
    likelihood = c(0.1, 0.2),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid1
##   species prior likelihood posterior
## 1       A   0.7        0.1 0.5384615
## 2       B   0.3        0.2 0.4615385

Panda_Grid2 <-
  expand.grid(
    species = c("A", "B")
  ) %>%
  mutate(
    prior = c(0.7, 0.3),
    likelihood = c(0.1 * 0.9, 0.2 * 0.8),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid2
##   species prior likelihood posterior
## 1       A   0.7       0.09 0.5675676
## 2       B   0.3       0.16 0.4324324
```
Always show your work. If I can’t tell how you got your answer, you probably won’t get full credit.
For R code, follow the style guide. It is not optional.
Discussion should not be in R comments. Use text for that.
Take advantage of the ability to typeset mathematical notation in R Markdown.
Leave some margins (on all sides). And leave some space between problems.
Your goal should be that I can read your work easily. The more time I have to spend just figuring out what you are doing/saying, the less your work is worth. Communication is an important skill. Here’s a chance to practice.
Some comments on probability based on recent homework.
Use good notation.
If you create notation for events, random variables, etc., explain it. A good template is “Let _____ be ________”. The first blank is your notation and the second is what it means.
\(P(A \cap B)\), \(P(A \mid B)\), and \(P(B \mid A)\) mean different things. Be sure to use the correct one.
Use of the equally likely rule requires justification – how do you know the outcomes are equally likely?
Don’t make up probability rules. If we don’t have the rule you want to use, take a moment to show that the rule is valid. (If you can’t, perhaps it isn’t really a rule after all.)
Suppose we have a test with a 97% specificity and a 99% sensitivity just like in Section @ref(discrete-params). Now suppose that a random person is selected, has a first test that is positive, then is retested and has a second test that is negative.
Taking into account both tests, and assuming the results of the two tests are independent, what is the probability that the person has the disease?
Hint: We can use the posterior after the first test as a prior for the second test. Be sure to keep as many decimal digits as possible (use R and don't round intermediate results).
Note: In this problem we are assuming that the results of the two tests are independent, which might not be the case for some medical tests.
```r
expand.grid(
  status = c("healthy", "diseased")
) %>%
  mutate(
    prior1 = c(999/1000, 1/1000),
    like1 = c(0.03, 0.99),
    post1 = prior1 * like1,
    post1 = post1 / sum(post1),
    prior2 = post1,
    like2 = c(0.97, 0.01),
    post2 = prior2 * like2,
    post2 = post2 / sum(post2)
  )
##     status prior1 like1      post1     prior2 like2        post2
## 1  healthy  0.999  0.03 0.96802326 0.96802326  0.97 0.9996595692
## 2 diseased  0.001  0.99 0.03197674 0.03197674  0.01 0.0003404308
```
Consider again the disease and diagnostic test of the previous exercise and Section @ref(discrete-params).
Suppose that a person selected at random from the population gets the test and it comes back negative, and that a retest then comes back positive. Compute the probability that the person has the disease.
How does the result compare with your answer in the previous exercise?
```r
expand.grid(
  status = c("healthy", "diseased")
) %>%
  mutate(
    prior1 = c(999/1000, 1/1000),
    like1 = c(0.97, 0.01),
    post1 = prior1 * like1,
    post1 = post1 / sum(post1),
    prior2 = post1,
    like2 = c(0.03, 0.99),
    post2 = prior2 * like2,
    post2 = post2 / sum(post2)
  )
##     status prior1 like1        post1       prior2 like2        post2
## 1  healthy  0.999  0.97 9.999897e-01 9.999897e-01  0.03 0.9996595692
## 2 diseased  0.001  0.01 1.031949e-05 1.031949e-05  0.99 0.0003404308
```
Modify MyBernGrid() so that it takes an argument specifying the probability for the HDI. Use it to create a plot showing a 50% HDI for theta using a symmetric triangle prior and data consisting of 3 successes and 5 failures.

```r
MyBernGrid2 <- function(
  x, n,               # x successes in n tries
  prior = dunif,      # function for prior
  resolution = 1000,  # number of intervals to use for grid
  prob = 0.95,        # probability for HDI
  ...) {              # additional arguments for prior
  Grid <-
    expand.grid(
      theta = seq(0, 1, length.out = resolution + 1)
    ) %>%
    mutate(  # saving only the normalized version of each
      prior = prior(theta, ...),
      prior = prior / sum(prior) * resolution,
      likelihood = dbinom(x, n, theta),
      likelihood = likelihood / sum(likelihood) * resolution,
      posterior = prior * likelihood,
      posterior = posterior / sum(posterior) * resolution
    )
  H <- hdi_from_grid(Grid, pars = "theta", prob = prob)
  gf_line(prior ~ theta, data = Grid, color = ~"prior",
          size = 1.15, alpha = 0.8) %>%
    gf_line(likelihood ~ theta, data = Grid, color = ~"likelihood",
            size = 1.15, alpha = 0.7) %>%
    gf_line(posterior ~ theta, data = Grid, color = ~"posterior",
            size = 1.15, alpha = 0.6) %>%
    gf_pointrangeh(
      height ~ mode + lo + hi, data = H,
      color = "red", size = 1) %>%
    gf_labs(title = "Prior/Likelihood/Posterior",
            subtitle = paste("Data: n =", n, ", x =", x),
            caption = paste0(100 * prob, "% HDI shown in red")) %>%
    gf_refine(
      scale_color_manual(
        values = c(
          "prior" = "forestgreen",
          "likelihood" = "blue",
          "posterior" = "red"),
        breaks = c("prior", "likelihood", "posterior")
      )) %>%
    print()
  invisible(Grid)  # return the Grid, but don't show it
}
library(triangle)
MyBernGrid2(10, 20, prior = dtriangle, prob = 0.5, a = 0, b = 1, c = 0.80)
```
Let’s try the grid method for a model with two parameters. Suppose we want to estimate the mean and standard deviation of the heights of 21-year-old American men or women (your choice which group). First, let’s get some data.
```r
library(NHANES)
Men <- NHANES %>% filter(Gender == "male", Age == 21)
Women <- NHANES %>% filter(Gender == "female", Age == 21)
```
Likelihood. Our model is that heights are normally distributed with mean \(\mu\) and standard deviation \(\sigma\):
Prior. For our prior, let’s use something informed just a little bit by what we know about people (we could do better with these priors, but that’s not our goal at the moment):
So our model is \[\begin{align*} \mathrm{Height} & \sim {\sf Norm}(\mu, \sigma) \\ \mu & \sim {\sf Unif}(150, 200) \\ \sigma & \sim {\sf Unif}(0, 20) \end{align*}\]
Grid. Use a grid that has 200-500 values for each parameter. Fill in the ?? below to create and update your grid.
Notes:
```r
library(purrr)
Height_Grid <-
  expand.grid(
    mu = seq(??, ??, ?? = ??),
    sigma = seq(??, ??, ?? = ??)
  ) %>%
  filter(sigma != 0) %>%  # remove sigma = 0
  mutate(
    prior = ??,
    logprior = ??,
    loglik =
      map2_dbl(mu, sigma,
               ~ ??),  # use .x for mu and .y for sigma
    logpost = logprior + loglik,
    posterior = exp(logpost)
  )
```
Once you have created and updated your grid, you can visualize your posterior using a command like this (use gf_lims() if you want to zoom in a bit):
```r
gf_tile(posterior ~ mu + sigma, data = Height_Grid) %>%
  gf_contour(posterior ~ mu + sigma, data = Height_Grid, color = "yellow")
```
Now answer the following questions.
Using the picture you just created, what would you say are credible values for \(\mu\) and for \(\sigma\)?
Now use hdi_from_grid() to compute a 90% highest density (posterior) intervals for each parameter. Do those make sense when you compare to the picture?
Create a plot showing the posterior distribution for \(\mu\) and a 90% HDI for \(\mu\). [Hint: how can you get a grid for the marginal distribution of \(\mu\) from the grid for \(\mu\) and \(\sigma\)?]
```r
library(purrr)
## 
## Attaching package: 'purrr'
## The following object is masked from 'package:mosaic':
## 
##     cross
Height_Grid <-
  expand.grid(
    mu = seq(150, 200, by = 0.1),
    sigma = seq(0, 20, by = 0.1)
  ) %>%
  filter(sigma > 0) %>%  # can't have sigma be 0
  mutate(
    prior = 1,
    logprior = 0,
    loglik =
      map2_dbl(mu, sigma,
               ~ sum(dnorm(Men$Height, mean = .x, sd = .y, log = TRUE))),
    logpost = logprior + loglik,
    posterior = exp(logpost),
    posterior1 = posterior / sum(posterior, na.rm = TRUE) / (0.1 * 0.1)
  )
gf_tile(posterior ~ mu + sigma, data = Height_Grid) %>%
  gf_contour(posterior ~ mu + sigma, data = Height_Grid, color = "yellow") %>%
  gf_lims(x = c(172, 180), y = c(7, 9))
## Warning: Removed 98499 rows containing non-finite values (stat_contour).
## Warning: Removed 98499 rows containing missing values (geom_tile).
hdi_from_grid(Height_Grid, pars = c("mu", "sigma"), prob = 0.9)
## # A tibble: 2 x 7
##   param    lo    hi  prob height mode_height  mode
##   <chr> <dbl> <dbl> <dbl>  <dbl>       <dbl> <dbl>
## 1 mu     174.  177. 0.906  0.109       0.450  176.
## 2 sigma    6.9   9   0.914  0.151      0.630    7.8
Mu_Grid <-
  Height_Grid %>%
  group_by(mu) %>%
  summarise(posterior = sum(posterior1, na.rm = TRUE)) %>%
  mutate(posterior1 = posterior / sum(posterior) / 0.1)
gf_line(posterior1 ~ mu, data = Mu_Grid) %>%
  gf_pointrangeh(height ~ mode + lo + hi,
                 data = hdi_from_grid(Height_Grid, pars = "mu", prob = 0.9),
                 color = "red")
```
```r
library(purrr)
Height_Grid <-
  expand.grid(
    mu = seq(150, 200, by = 0.1),
    sigma = seq(0, 20, by = 0.1)
  ) %>%
  filter(sigma > 0) %>%  # can't have sigma be 0
  mutate(
    prior = dtriangle(mu, 150, 200, 180) * dtriangle(sigma, 0, 20, 5),
    logprior = log(prior),  # the prior enters the posterior via logprior
    loglik =
      map2_dbl(mu, sigma,
               ~ sum(dnorm(Men$Height, mean = .x, sd = .y, log = TRUE))),
    logpost = logprior + loglik,
    posterior = exp(logpost),
    posterior1 = posterior / sum(posterior, na.rm = TRUE) / (0.1 * 0.1)
  )
hdi_from_grid(Height_Grid, pars = c("mu", "sigma"), prob = 0.9)
## # A tibble: 2 x 7
##   param    lo    hi  prob height mode_height  mode
##   <chr> <dbl> <dbl> <dbl>  <dbl>       <dbl> <dbl>
## 1 mu     174.  177. 0.906  0.109       0.450  176.
## 2 sigma    6.9   9   0.914  0.151      0.630    7.8
gf_tile(posterior ~ mu + sigma, data = Height_Grid) %>%
  gf_contour(posterior ~ mu + sigma, data = Height_Grid, color = "yellow") %>%
  gf_lims(x = c(172, 180), y = c(7, 9))
## Warning: Removed 98499 rows containing non-finite values (stat_contour).
## Warning: Removed 98499 rows containing missing values (geom_tile).
Mu_Grid <-
  Height_Grid %>%
  group_by(mu) %>%
  summarise(posterior = sum(posterior1, na.rm = TRUE)) %>%
  mutate(posterior1 = posterior / sum(posterior) / 0.1)
gf_line(posterior1 ~ mu, data = Mu_Grid) %>%
  gf_pointrangeh(height ~ mode + lo + hi,
                 data = hdi_from_grid(Height_Grid, pars = "mu", prob = 0.9),
                 color = "red")
```
Bob plays basketball. He shoots 70% of his shots from 2-point range and makes 48% of these shots. He shoots 30% of his shots from 3-point range and makes 32% of these shots. Bob just made a shot. What is the probability that it was a 3-point shot?

Do this problem twice. The first time, use probability rules, carefully denoting the probabilities involved. (For any number that appears, I should be able to tell from your notation where it came from.) The second time, use the “grid method”.
Here’s the grid method.
```r
expand.grid(
  shot = c(2, 3)
) %>%
  mutate(
    prior = c(0.7, 0.3),
    likelihood = c(0.48, 0.32),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
##   shot prior likelihood posterior
## 1    2   0.7       0.48 0.7777778
## 2    3   0.3       0.32 0.2222222
```
Alice has 3 hats labeled with the letters H, A, and T. In each hat are marbles of various colors.
| Hat | White marbles | Red marbles | Yellow marbles |
|---|---|---|---|
| H | 4 | 10 | 6 |
| A | 6 | 12 | 2 |
| T | 5 | 3 | 2 |
Alice randomly selects a hat by flipping two coins. If both are heads, she chooses hat H. If both are tails, she chooses hat T. If there is one head and one tail, she chooses hat A. Once that hat is selected, she draws out two marbles.
If the two marbles are both white, what is the probability that the hat was hat A?
If there is one red marble and one yellow marble, what is the probability that the hat was hat A?
If the two marbles are the same color, what is the probability that the hat was hat A?
```r
expand.grid(
  hat = c("H", "A", "T")
) %>%
  mutate(
    prior = c(1/4, 1/2, 1/4),
    l1 = c(4/20 * 3/19, 6/20 * 5/19, 5/10 * 4/9),
    l2 = c(2 * 10/20 * 6/19, 2 * 12/20 * 2/19, 2 * 3/10 * 2/9),
    l3 = c(4/20 * 3/19 + 10/20 * 9/19 + 6/20 * 5/19,
           6/20 * 5/19 + 12/20 * 11/19 + 2/20 * 1/19,
           5/10 * 4/9 + 3/10 * 2/9 + 2/10 * 1/9),
    post1 = prior * l1 / sum(prior * l1),
    post2 = prior * l2 / sum(prior * l2),
    post3 = prior * l3 / sum(prior * l3)
  ) %>%
  filter(hat == "A")
##   hat prior         l1        l2        l3     post1 post2    post3
## 1   A   0.5 0.07894737 0.1263158 0.4315789 0.3835227  0.36 0.567256
```
Suppose we have a coin that we know comes from a magic-trick store, and therefore we believe that the coin is strongly biased either usually to come up heads or usually to come up tails, but we don’t know which.
Express this belief as a beta prior. That is, find shape parameters that lead to a beta distribution that corresponds to this belief.
Now we flip the coin 5 times and it comes up heads in 4 of the 5 flips. What is the posterior distribution?
Use quick_bern_beta() or a similar function of your own creation to show the prior and posterior graphically.
We need a beta distribution with both shape parameters less than 1. For example:
```r
# prior probability that heads happens in <= 10% of flips
# about 1/3 <= 10%, 1/3 >= 90%, and 1/3 between 10% and 90%
pbeta(0.10, shape1 = 0.2, shape2 = 0.2)
## [1] 0.3366898
gf_dist("beta", shape1 = 0.2, shape2 = 0.2)
```
The posterior in this case would be Beta(4.2, 1.2).
```r
library(CalvinBayes)
quick_bern_beta(x = 4, n = 5, a = 0.2, b = 0.2) %>%
  gf_dist("beta", shape1 = 4.2, shape2 = 1.2, color = "red",
          linetype = "dashed") %>%
  gf_lims(y = c(0, 10))
```
Show that if \(\alpha, \beta > 1\), then the mode of a Beta(\(\alpha\), \(\beta\)) distribution is \(\displaystyle {\frac{\alpha -1}{\alpha +\beta -2}}\).
Hint: What would you do if you were in Calculus I?
Find the mode by finding where the derivative of the pdf is 0. You can make things a bit easier if you take the log of the pdf first.
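Spelling out the hint: on \((0, 1)\) the pdf is proportional to \(\theta^{\alpha-1}(1-\theta)^{\beta-1}\), and maximizing the log is easier.

\[\begin{align*} \log f(\theta) &= (\alpha - 1) \log \theta + (\beta - 1) \log(1 - \theta) + C \\ \frac{d}{d\theta} \log f(\theta) &= \frac{\alpha - 1}{\theta} - \frac{\beta - 1}{1 - \theta} = 0 \\ (\alpha - 1)(1 - \theta) &= (\beta - 1) \theta \\ \theta &= \frac{\alpha - 1}{\alpha + \beta - 2} \end{align*}\]

When \(\alpha, \beta > 1\), the second derivative \(-\frac{\alpha - 1}{\theta^2} - \frac{\beta - 1}{(1-\theta)^2}\) is negative, so this critical point is indeed a maximum.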
Suppose we have a coin that we know comes from a magic-trick store, and therefore we believe that the coin is strongly biased either usually to come up heads or usually to come up tails, but we don’t know which.
Express this belief as a beta prior. That is, find shape parameters that lead to a beta distribution that corresponds to this belief.
Now we flip the coin 5 times and it comes up heads in 4 of the 5 flips. What is the posterior distribution?
Use quick_bern_beta() or a similar function of your own creation to show the prior and posterior graphically.
The prior should choose both shape parameters to be less than 1.
If we want the prior to say that it is equally likely that the bias is less than 10%, more than 90%, or between 10% and 90%, then \({\sf Beta}(0.2, 0.2)\) is about right. Since the prior gets quite large right near 0 and 1, choosing custom limits for the y axis makes for a better picture.
```r
pbeta(0.10, 0.2, 0.2)
## [1] 0.3366898
quick_bern_beta(4, 5, a = 0.2, b = 0.2) %>% gf_lims(y = c(0, 5))
```
Suppose we estimate a proportion \(\theta\) using a \({\sf Beta}(10, 10)\) prior and observe 26 successes and 48 failures.
[Hint: qbeta()]

Most of this is straightforward from the formulas. Part d is tricky to do just right.
a. Beta(36, 58)
b. mean is 36/(36 + 58) = 0.3829787
c. mode is 35/(35 + 57) = 0.3804348
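A quick base-R check of the mean and mode values (just the arithmetic behind parts b and c):

```r
# Mean and mode of Beta(36, 58)
a <- 36; b <- 58
a / (a + b)            # mean: 0.3829787
(a - 1) / (a + b - 2)  # mode: 0.3804348
```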
```r
# this is close
almost <- qbeta(c(0.05, 0.95), 36, 58); almost
## [1] 0.3023307 0.4664734
# but heights are not exactly the same
dbeta(almost, 36, 58)
## [1] 2.219110 1.986591
# so we need to shift the interval to the left (lowering the density at the
# left end and raising it at the right end)
# this function is 0 when we find the correct probability on the left side
left_prob <- function(p) {
  dbeta(qbeta(p, 36, 58), 36, 58) -
    dbeta(qbeta(0.90 + p, 36, 58), 36, 58)
}
uniroot(left_prob, c(0, 0.05))
## $root
## [1] 0.04647647
## 
## $f.root
## [1] 9.587978e-05
## 
## $iter
## [1] 3
## 
## $init.it
## [1] NA
## 
## $estim.prec
## [1] 6.103516e-05
left_p <- uniroot(left_prob, c(0, 0.05))$root
HDI <- qbeta(c(left_p, 0.90 + left_p), 36, 58); HDI
## [1] 0.3006980 0.4647477
pbeta(HDI, 36, 58) %>% diff()
## [1] 0.9
gf_dist("beta", shape1 = 36, shape2 = 58) %>%
  gf_segment(
    dbeta(HDI[1], 36, 58) + dbeta(HDI[2], 36, 58) ~ HDI[1] + HDI[2],
    color = "red") %>%
  gf_segment(
    dbeta(almost[1], 36, 58) + dbeta(almost[2], 36, 58) ~ almost[1] + almost[2],
    color = "blue")
```
Suppose a state-wide election is approaching, and you are interested in knowing whether the general population prefers the democrat or the republican. There is a just-published poll in the newspaper, which states that of 100 randomly sampled people, 58 preferred the republican and the remainder preferred the democrat.
Suppose that before the newspaper poll, your prior belief was a uniform distribution. What is the 95% HDI on your beliefs after learning of the newspaper poll results?
Based on what you know about elections, why is a uniform prior not a great choice? Repeat part (a) with a prior that conforms better to what you know about elections. How much does the change of prior affect the 95% HDI?
You find another poll conducted by a different news organization. In this second poll, 56 of 100 people preferred the republican. Assuming that peoples’ opinions have not changed between polls, what is the 95% HDI on the posterior taking both polls into account? Make it clear which prior you are using.
Based on this data (and your choice of prior, and assuming public opinion doesn’t change between the time of the polls and election day), what is the probability that the republican will win the election?
You have some choices in this problem, so there isn’t one correct answer. Things I especially looked for were:
Your justification for your prior. It should indicate (a) that you understand what the prior is and (b) that you have chosen something that reflects how elections turn out.
The posterior probability of a republican winning is the probability in the posterior distribution that \(\theta > 0.5\). For some reason, several of you reported the mode, which has almost nothing to do with this probability.
# part d:
# Use Beta(10, 10) prior
# Beta(50, 50) or Beta(100, 100) might be even better.
# Then the posterior is Beta(10 + 58 + 56, 10 + 42 + 44)
# Post. Prob. theta > 50% is
1 - pbeta(0.5, 10 + 58 + 56, 10 + 42 + 44)
## [1] 0.9708828
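For comparison, part (a) can be sketched directly: with the uniform (Beta(1, 1)) prior, the first poll gives a Beta(59, 43) posterior. That posterior is nearly symmetric, so the central 95% interval is a good approximation to the 95% HDI here.

```r
# Part (a): uniform prior + 58 of 100 -> posterior Beta(1 + 58, 1 + 42)
qbeta(c(0.025, 0.975), 59, 43)   # roughly 0.48 to 0.67
```

A prior reflecting that state-wide elections are rarely lopsided (part b) would be concentrated near 0.5 and pull this interval slightly toward 0.5.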
In this exercise, you will see how the Metropolis algorithm operates with a multimodal prior.
Define the function \(p(\theta) = (\cos(4 \pi \theta) + 1)^2/1.5\) in R.
Use gf_function() to plot \(p(\theta)\) on the interval from 0 to 1. [Hint: Use the xlim argument.]
Use integrate() to confirm that \(p\) is a pdf.
Run metro_bern() with \(p\) as your prior, with no data (x = 0, n = 0), and with size = 0.2. Plot the posterior distribution of \(\theta\) and explain why it looks the way it does.
Now create a posterior histogram or density plot using x = 2, n = 3. Do the results look reasonable? Explain.
Now create a posterior histogram or density plot with x = 1, n = 3, and size = 0.02. Comment on how this compares to the plot you made in the previous item.
Repeat the previous two items but with start = 0.15 and start = 0.95. How does this help explain what is happening? Why is it good practice to run MCMC algorithms with several different starting values as part of the diagnostic process?
How would looking at trace plots from multiple starting points help you detect this problem? (What would the trace plots look like when things are good? What would they look like when things are bad?)
With no data, the likelihood is 1 for all parameter values. (The probability of seeing nothing if you collect no data is 100% no matter what the parameter values are.) So the posterior is the prior in this situation. This gives us a way to sample from the prior distribution.
Note: Because of the way that gf_dens() smooths, and because the prior has peaks right near the borders, a density plot isn’t as good as a histogram in this situation.
set.seed(12345)
p <- function(theta) (cos(4 * pi * theta) + 1)^2/1.5
integrate(p, 0, 1)
## 1 with absolute error < 1.2e-10
Metro_p <- metro_bern(0, 0, size = 0.2, prior = p)
gf_dhistogram(~ theta, data = Metro_p, bins = 100) %>%
  gf_dens(color = ~"kernel density estimate (kde)") %>%
  gf_function(p, color = ~"p")
M1a <- metro_bern(x = 2, n = 3, size = 0.2, prior = p, start = 0.5)
M2a <- metro_bern(x = 2, n = 3, size = 0.2, prior = p, start = 0.15)
M3a <- metro_bern(x = 2, n = 3, size = 0.2, prior = p, start = 0.95)
gf_dens(~ theta, data = M1a, color = ~"1: start = 0.5; size = 0.2") %>%
  gf_dens(~ theta, data = M2a, color = ~"2: start = 0.15; size = 0.2") %>%
  gf_dens(~ theta, data = M3a, color = ~"3: start = 0.95; size = 0.2")
M1b <- metro_bern(x = 2, n = 3, size = 0.02, prior = p, start = 0.5)
M2b <- metro_bern(x = 2, n = 3, size = 0.02, prior = p, start = 0.15)
M3b <- metro_bern(x = 2, n = 3, size = 0.02, prior = p, start = 0.95)
gf_dens(~ theta, data = M1b, color = ~"1: start = 0.5; size = 0.02") %>%
  gf_dens(~ theta, data = M2b, color = ~"2: start = 0.15; size = 0.02") %>%
  gf_dens(~ theta, data = M3b, color = ~"3: start = 0.95; size = 0.02")
From the above we see that with the smaller step size, the “posterior” samples are highly influenced by the starting point (and so are not actually representative of the posterior). That’s a bad thing.
With the larger step size, we get essentially the same posterior sampling regardless of starting point. That’s a good thing. The resulting distribution has increased credibility in the modes on the right and decreased credibility in the left mode because there were more heads than tails in the data.
This example is a bit extreme, but take a look at the traceplots.
size = 0.2 (good):
gf_line(theta ~ step, data = M1a, color = ~"Chain 1", alpha = 0.5) %>%
  gf_line(theta ~ step, data = M2a, color = ~"Chain 2", alpha = 0.5) %>%
  gf_line(theta ~ step, data = M3a, color = ~"Chain 3", alpha = 0.5)
size = 0.02 (bad):
gf_line(theta ~ step, data = M1b, color = ~"Chain 1", alpha = 0.5) %>%
  gf_line(theta ~ step, data = M2b, color = ~"Chain 2", alpha = 0.5) %>%
  gf_line(theta ~ step, data = M3b, color = ~"Chain 3", alpha = 0.5)
Bad features: each chain stays near its starting point and the chains never overlap, so no single chain explores the whole posterior.
Note: Alone, chains 1 and 3 look fine; it isn’t until we compare the chains that we see the problem (unless you already know what the posterior should look like).
Sampling from priors. You want to know who is the better free throw shooter, Alice or Bob. You decide to have each shoot a number of shots and record their makes and misses. You are primarily interested in the difference between their free throw shooting proportions (\(\theta_2 - \theta_1\)), and you are curious to know how your choice of priors for \(\theta_1\) and \(\theta_2\) affects the prior for \(\theta_2 - \theta_1\). For each situation below, use JAGS to sample from the prior distribution for \(\theta_2 - \theta_1\) and create a density plot. (In each case, assume the priors for \(\theta_1\) and \(\theta_2\) are independent.)
Both priors are uniform. What distribution do you think the prior for \(\theta_2 - \theta_1\) is?
Both priors are \({\sf Beta}(0.2, 0.2)\). Explain why the prior for \(\theta_2 - \theta_1\) looks the way it does.
Hint: Don’t forget to set DIC = FALSE when sampling from the prior.
model_a <- function() {
  for (i in 1:Nobs) {
    hit[i] ~ dbern(theta[shooter[i]])
  }
  for (s in 1:Nsubj) {
    theta[s] ~ dbeta(1, 1)
  }
  delta <- theta[2] - theta[1]
}
model_b <- function() {
  for (i in 1:Nobs) {
    hit[i] ~ dbern(theta[shooter[i]])
  }
  for (s in 1:Nsubj) {
    theta[s] ~ dbeta(0.2, 0.2)
  }
  delta <- theta[2] - theta[1]
}
jags_a <-
  jags(
    model = model_a,
    data = list(Nobs = 0, Nsubj = 2),
    parameters.to.save = c("theta", "delta"),
    DIC = FALSE
  )
## module glm loaded
## module dic loaded
jags_b <-
  jags(
    model = model_b,
    data = list(Nobs = 0, Nsubj = 2),
    parameters.to.save = c("theta", "delta"),
    DIC = FALSE
  )
mcmc_areas(as.mcmc(jags_a), prob = 0) %>% gf_labs(title = "Model A")
mcmc_areas(as.mcmc(jags_b), prob = 0) %>% gf_labs(title = "Model B")
The first distribution looks like a triangle distribution centered at 0 (and it is).
The second distribution has peaks at 0 and near -1 and 1 because each \(\theta_s\) is likely to be near 0 or 1. If both are near the same end, we get a difference near 0. But if they are on opposite ends, we get a difference near 1 or -1. The larger bump is near 0 because there are two ways to get into that bump (both near 0 or both near 1).
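These shapes can also be checked without JAGS by sampling directly from the priors in base R (a quick Monte Carlo sketch, not part of the assigned solution):

```r
set.seed(54321)
n <- 1e5
# Model A: uniform priors; the difference has a triangle density on (-1, 1)
delta_a <- runif(n) - runif(n)
# Model B: Beta(0.2, 0.2) priors pile up near 0 and 1, so the
# difference piles up near -1, 0, and 1
delta_b <- rbeta(n, 0.2, 0.2) - rbeta(n, 0.2, 0.2)
hist(delta_a, breaks = 100)
hist(delta_b, breaks = 100)
```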
Now suppose that Alice makes 25 out of 30 shots and Bob makes 18 out of 32. Is this enough evidence to conclude that Alice is the better shooter? Do this two ways. In each case, use a \({\sf Beta}(4, 2)\) prior for both \(\theta_1\) and \(\theta_2\).
Create data and use a model like the one used elsewhere in this chapter. [Hint: rep() is handy for creating a bunch of values that are the same.]
Instead of using dbern(), use dbin() (JAGS version of the binomial distribution). This should allow you to get by with simpler data that consists only of the numbers 25, 30, 18, and 32. Note: the order of arguments for dbin() is probability first, then number of trials. This is reversed from the order in R.
DataList <-
  list(
    Nobs = 30 + 32, Nsubj = 2,
    hit = c(rep(1, 25), rep(0, 5), rep(1, 18), rep(0, 14)),
    shooter = c(rep(1, 30), rep(2, 32))
  )
model_c <- function() {
  for (i in 1:Nobs) {
    hit[i] ~ dbern(theta[shooter[i]])
  }
  for (s in 1:Nsubj) {
    theta[s] ~ dbeta(4, 2)
  }
  delta <- theta[1] - theta[2]
}
jags_c <-
  jags(
    model = model_c,
    data = DataList,
    parameters.to.save = c("theta", "delta"),
    DIC = FALSE
  )
mcmc_areas(as.mcmc(jags_c), prob = 0.9) %>% gf_labs(title = "Model C")
prop( ~ (delta > 0), data = posterior(jags_c))
Nearly all of the posterior probability for \(\theta_1 - \theta_2\) is positive, so we have compelling evidence that Alice is a better shooter.
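The problem also asks for a dbin() version. Here is a sketch of what that model could look like (model_c_bin and DataListBin are my names, not part of the original solution):

```r
# dbin() version: the data reduce to four numbers.
# Note: JAGS's dbin() takes the probability first, then the number of
# trials -- reversed from R's dbinom().
model_c_bin <- function() {
  z1 ~ dbin(theta[1], n1)  # Alice: z1 makes in n1 shots
  z2 ~ dbin(theta[2], n2)  # Bob:   z2 makes in n2 shots
  for (s in 1:2) {
    theta[s] ~ dbeta(4, 2)
  }
  delta <- theta[1] - theta[2]
}
DataListBin <- list(z1 = 25, n1 = 30, z2 = 18, n2 = 32)
```

Fitting with `jags(model = model_c_bin, data = DataListBin, parameters.to.save = c("theta", "delta"), DIC = FALSE)` should give essentially the same posterior as model_c.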
# With Nobs = 0, the hit and shooter values are never used,
# so this samples from the prior.
DataList0 <-
  list(
    Nobs = 0, Nsubj = 2,
    hit = c(rep(1, 25), rep(0, 5), rep(1, 18), rep(0, 14)),
    shooter = c(rep(1, 30), rep(2, 32))
  )
jags_c <-
  jags(
    model = model_c,
    data = DataList0,
    parameters.to.save = c("theta", "delta"),
    DIC = FALSE
  )
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 0
## Unobserved stochastic nodes: 2
## Total graph size: 131
##
## Initializing model
mcmc_areas(as.mcmc(jags_c)) %>% gf_labs(title = "Model C -- Priors")
The prior is agnostic about who is better (centered at 0) and quite broad, so many differences are possible. Because of the peak at 0, we will hedge our beliefs a bit toward believing Alice and Bob are more similar.
Since we know the priors on \(\theta\) are beta distributions and we see some odd wiggles in our pictures, we could increase the posterior (effective) sample size to smooth things out a bit more. But that won’t change the overall story here.
If shooting is easier, we would expect both to do better. If it is more difficult, we would expect both to do worse. If we don’t know how hard it is to make a free throw, we might at least expect them to either both do better or both do worse.
Consider the model below.
Create a plot of the prior distribution of theta1 and theta2. A scatter plot with an overlaid density plot works well. How does this prior compare to the priors in problem 1? (Hint: Don’t forget to use DIC = FALSE when sampling from the prior.)
Fit the model to the Alice and Bob data. How does this choice of prior change things?
diff_model <- function() {
  z1 ~ dbin(theta1, n1)
  z2 ~ dbin(theta2, n2)
  theta2 <- theta1 + delta
  theta1 ~ dbeta(4, 2)
  delta ~ dnorm(0, 400)  # JAGS dnorm uses precision: 400 = 1/0.05^2, i.e. sd = 0.05
}
diff_jags0 <-
  jags.parallel(
    model = diff_model,
    data = list(n1 = 0, z1 = 0, n2 = 0, z2 = 0),
    parameters.to.save = c("theta1", "theta2", "delta"),
    n.iter = 5000,
    n.chains = 4,
    DIC = FALSE
  )
diff_jags <-
  jags.parallel(
    model = diff_model,
    data = list(n1 = 30, z1 = 25, n2 = 32, z2 = 18),
    parameters.to.save = c("theta1", "theta2", "delta"),
    n.iter = 5000,
    n.chains = 4,
    DIC = FALSE  # suppress deviance calculation
  )
mcmc_pairs(as.mcmc(diff_jags0), off_diag_args = list(size = 0.3, alpha = 0.5))
gf_point(theta2 ~ theta1, alpha = 0.2, data = posterior(diff_jags0)) %>%
  gf_density2d() %>%
  gf_labs(title = "Prior distribution")
mcmc_areas(as.mcmc(diff_jags), prob = 0.9) %>% gf_labs(title = "Posterior")
With this prior, which held strongly that the two players were similar, we are not convinced that Alice is better than Bob.
One could, of course, consider a prior somewhere between these two options – one where the correlation between \(\theta_1\) and \(\theta_2\) is not as strong, but also not 0. This model could easily be modified to do that by using a wider distribution for the difference between the two proportions.
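For instance, since JAGS’s dnorm() is parameterized by precision (1/sd^2), only delta’s prior line needs to change. A hypothetical middle-ground version with sd = 0.15 instead of 0.05:

```r
diff_model_wider <- function() {
  z1 ~ dbin(theta1, n1)
  z2 ~ dbin(theta2, n2)
  theta2 <- theta1 + delta
  theta1 ~ dbeta(4, 2)
  delta ~ dnorm(0, 1 / 0.15^2)  # precision about 44.4, i.e. sd = 0.15
}
```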
Prior <-
  expand.grid(
    theta1 = seq(0, 1, by = 0.001),
    delta = seq(-1, 1, by = 0.001)
  ) %>%
  mutate(
    theta2 = theta1 + delta,
    prior = dbeta(theta1, 4, 2) * dnorm(delta, 0, 0.05),
    likelihood = 1,
    posterior = prior * likelihood
  ) %>%
  resample(prob = .$posterior, size = 5000)
Posterior <-
  expand.grid(
    theta1 = seq(0, 1, by = 0.001),
    delta = seq(-1, 1, by = 0.001)
  ) %>%
  mutate(
    theta2 = theta1 + delta,
    prior = dbeta(theta1, 4, 2) * dnorm(delta, 0, 0.05) *
      (theta2 > 0) * (theta2 < 1),
    likelihood = ifelse(
      theta2 > 0 & theta2 < 1,
      dbinom(25, 30, theta1) * dbinom(18, 32, theta2),
      0
    ),
    posterior = prior * likelihood
  ) %>%
  resample(prob = .$posterior, size = 5000)
## Warning in dbinom(18, 32, theta2): NaNs produced
library(GGally)
##
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
##
## nasa
ggpairs(
  Prior[, c("delta", "theta1", "theta2")],
  upper = list(continuous = wrap("density", alpha = 0.5)),
  lower = list(continuous = wrap("points", alpha = 0.1))
)
ggpairs(
  Posterior[, c("delta", "theta1", "theta2")],
  upper = list(continuous = wrap("density", alpha = 0.5)),
  lower = list(continuous = wrap("points", alpha = 0.1))
)